Example compositional explanations (score in parentheses):
- North America > United States (0.14)
- Europe > Sweden (0.05)
Towards a fuller understanding of neurons with Clustered Compositional Explanations
Compositional Explanations is a method for identifying logical formulas of concepts that approximate a neuron's behavior. However, these explanations are tied to the narrow slice of neuron activations (i.e., the highest ones) used to check the alignment, and thus lack completeness. In this paper, we propose a generalization, called Clustered Compositional Explanations, that combines Compositional Explanations with clustering and a novel search heuristic to approximate a broader spectrum of neuron behavior. We define and address the problems that arise when applying these methods to multiple ranges of activations, analyze the insights retrievable by our algorithm, and propose desiderata that can be used to study the explanations returned by different algorithms.
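As a rough illustration of the clustering idea above, the sketch below partitions a neuron's activation values into ranges with 1-D k-means and provides an IoU helper for scoring each range's binary mask against a concept mask. The function names, and the choice of k-means itself, are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def cluster_activation_ranges(activations, n_clusters=3, n_iters=20):
    """Partition a neuron's activation values into ranges via 1-D k-means
    (a stand-in for the clustering step; the paper's procedure may differ)."""
    acts = np.asarray(activations, dtype=float).ravel()
    # Initialize centroids spread evenly across the observed range.
    centroids = np.linspace(acts.min(), acts.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each activation to its nearest centroid.
        labels = np.abs(acts[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = acts[labels == k].mean()
    return labels, centroids

def iou(mask_a, mask_b):
    """Intersection-over-union between two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0
```

Each cluster's activation range would then yield its own binary mask, and an explanation is searched for per cluster instead of only for the top activations.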
Compositional Explanations of Neurons
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts that closely approximate neuron behavior. Compared to prior work that uses atomic labels as explanations, analyzing neurons compositionally allows us to more precisely and expressively characterize their behavior. We use this procedure to answer several questions on interpretability in models for vision and natural language processing. First, we examine the kinds of abstractions learned by neurons. In image classification, we find that many neurons learn highly abstract but semantically coherent visual concepts, while other polysemantic neurons detect multiple unrelated features; in natural language inference (NLI), neurons learn shallow lexical heuristics from dataset biases. Second, we see whether compositional explanations give us insight into model performance: vision neurons that detect human-interpretable concepts are positively correlated with task performance, while NLI neurons that fire for shallow heuristics are negatively correlated with task performance. Finally, we show how compositional explanations provide an accessible way for end users to produce simple copy-paste adversarial examples that change model behavior in predictable ways.
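The core scoring step described above can be sketched as evaluating a logical formula over boolean concept masks and measuring IoU against the binarized neuron activation mask. The nested-tuple formula encoding below is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def evaluate_formula(formula, concept_masks):
    """Evaluate a logical concept formula over boolean masks.
    A formula is a concept name (str) or a nested tuple:
    ("NOT", f) | ("AND", f1, f2) | ("OR", f1, f2)."""
    if isinstance(formula, str):
        return concept_masks[formula]
    op = formula[0]
    if op == "NOT":
        return ~evaluate_formula(formula[1], concept_masks)
    left = evaluate_formula(formula[1], concept_masks)
    right = evaluate_formula(formula[2], concept_masks)
    return (left & right) if op == "AND" else (left | right)

def explanation_iou(neuron_mask, formula, concept_masks):
    """IoU between the binarized neuron mask and the formula's mask."""
    m = evaluate_formula(formula, concept_masks)
    inter = (neuron_mask & m).sum()
    union = (neuron_mask | m).sum()
    return inter / union if union else 0.0
```

A compositional explanation is then the formula maximizing this IoU, which is why it can characterize a neuron more precisely than any single atomic concept.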
Refining Language Models with Compositional Explanations
Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets and are thus vulnerable when making inferences in a new domain. Prior work reveals such spurious patterns via post-hoc explanation algorithms that compute the importance of input features. The model is then regularized to align the importance scores with human knowledge, so that unintended model behaviors are eliminated. However, such a regularization technique lacks flexibility and coverage, since only importance scores for a pre-defined list of features are adjusted, while more complex human knowledge, such as feature interactions and pattern generalization, can hardly be incorporated. In this work, we propose to refine a learned language model for a target domain by collecting human-provided compositional explanations of observed biases. By parsing these explanations into executable logic rules, the human-specified refinement advice from a small set of explanations can be generalized to more training examples. We additionally introduce a regularization term allowing adjustments to both the importance and the interactions of features to better rectify model behavior. We demonstrate the effectiveness of the proposed approach on two text classification tasks, showing improved performance in the target domain as well as improved model fairness after refinement.
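A minimal sketch of the two ingredients mentioned above: an executable logic rule that selects examples exhibiting a suspected bias, and a penalty term pulling the model's feature-importance scores toward human-specified targets on the positions the rule selects. All names and the exact penalty form (squared error) are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def rule_matches(tokens, trigger, context_words):
    """A toy executable logic rule: fires when `trigger` occurs together
    with any of `context_words` in the same example. Purely illustrative."""
    return trigger in tokens and any(w in tokens for w in context_words)

def refinement_penalty(importance, target, selected, weight=1.0):
    """Squared-error penalty pulling the model's importance scores toward
    human-specified targets on the token positions a rule selects."""
    importance = np.asarray(importance, dtype=float)
    target = np.asarray(target, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    return weight * float(np.sum((importance[selected] - target[selected]) ** 2))
```

Because the rule is executable, one annotated explanation can flag many training examples, and the penalty is added to the task loss only on the flagged positions.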
Guaranteed Optimal Compositional Explanations for Neurons
La Rosa, Biagio, Gilpin, Leilani H.
While neurons are the basic units of deep neural networks, it is still unclear what they learn and if their knowledge is aligned with that of humans. Compositional explanations aim to answer this question by describing the spatial alignment between neuron activations and concepts through logical rules. These logical descriptions are typically computed via a search over all possible concept combinations. Since computing the spatial alignment over the entire state space is computationally infeasible, the literature commonly adopts beam search to restrict the space. However, beam search cannot provide any theoretical guarantees of optimality, and it remains unclear how close current explanations are to the true optimum. In this theoretical paper, we address this gap by introducing the first framework for computing guaranteed optimal compositional explanations. Specifically, we propose: (i) a decomposition that identifies the factors influencing the spatial alignment, (ii) a heuristic to estimate the alignment at any stage of the search, and (iii) the first algorithm that can compute optimal compositional explanations within a feasible time. Using this framework, we analyze the differences between optimal and non-optimal explanations in the most popular settings for compositional explanations, the computer vision domain and Convolutional Neural Networks. In these settings, we demonstrate that 10-40 percent of explanations obtained with beam search are suboptimal when overlapping concepts are involved. Finally, we evaluate a beam-search variant guided by our proposed decomposition and heuristic, showing that it matches or improves runtime over prior methods while offering greater flexibility in hyperparameters and computational resources.
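The beam-search baseline discussed above can be sketched as follows: keep the top-k formulas by IoU at each length, extending each with AND / OR / AND NOT over the concept inventory. This is an illustrative sketch of plain beam search, not the paper's optimal algorithm or its proposed heuristic.

```python
import numpy as np

def beam_search_explanation(neuron_mask, concept_masks, max_len=3, beam_size=5):
    """Toy beam search for the concept formula whose mask best matches
    a binarized neuron activation mask (by IoU). Illustrative sketch only."""
    def iou(mask):
        inter = (neuron_mask & mask).sum()
        union = (neuron_mask | mask).sum()
        return inter / union if union else 0.0

    # Length-1 beam: atomic concepts ranked by IoU.
    beam = sorted(
        [(iou(m), name, m) for name, m in concept_masks.items()],
        key=lambda t: t[0], reverse=True,
    )[:beam_size]
    best = beam[0]

    for _ in range(max_len - 1):
        candidates = []
        for _, formula, mask in beam:
            for name, cmask in concept_masks.items():
                candidates.append((f"({formula} AND {name})", mask & cmask))
                candidates.append((f"({formula} OR {name})", mask | cmask))
                candidates.append((f"({formula} AND NOT {name})", mask & ~cmask))
        beam = sorted(
            [(iou(m), f, m) for f, m in candidates],
            key=lambda t: t[0], reverse=True,
        )[:beam_size]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best[1], best[0]
```

Because only `beam_size` partial formulas survive each step, a formula whose prefix scores poorly can be pruned even if its completion would be optimal, which is exactly the suboptimality the paper quantifies.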
Further example compositional explanations (score in parentheses):
- North America > United States > California (0.14)
- Europe > Sweden (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Asia (0.04)
- Government > Regional Government > North America Government > United States Government (0.46)
- Government > Military (0.46)
- Education (0.46)
Review for NeurIPS paper: Compositional Explanations of Neurons
Summary and Contributions: This paper presents a thoughtful and informative attempt to understand what information is encoded in individual neurons and sets of neurons (by which the authors mean units in neural networks, not biological neurons; this is a somewhat unfortunate use of the word, and it might be helpful in the future to speak of neural network units or some other term instead). It is by now clear to anyone working with these networks that each individual neuron, especially at the lower and intermediate levels, encodes not some simple human-understandable feature (like the color red, size large, word class noun, or type Person), but rather a complex combination of what one could call sub-facets, each of which by itself is often not easily described to a human. Combinations of these sub-facets, taken from different neurons acting in tandem, JOINTLY encode the facets that are more accessible to humans. But of course the other sub-facets also encoded by the neurons present in a human-accessible feature cluster might encode [parts of] a variety of totally unrelated other features, with the result that simple hotspot analysis and similar highlighting techniques are never fully determinate or clear, but always rather ambiguous and 'smeared'.
Review for NeurIPS paper: Compositional Explanations of Neurons
The reviewers were quite positive about this paper, which works to automatically explain the behavior of neurons in deep networks using compositionality. These results will be of interest to researchers in both NLP and computer vision. The reviewers raised a few points for clarification (the requirement for labeled data, the choice of tasks), which were addressed in the rebuttal; the authors state that they will incorporate these caveats and clarifications into the final version and/or supplementary material.